Whitening Transformation

A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1. The transformation is called "whitening" because it changes the input vector into a white noise vector.

Several other transformations are closely related to whitening:
# the decorrelation transform removes only the correlations but leaves variances intact,
# the standardization transform sets variances to 1 but leaves correlations intact,
# a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.
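As a quick illustration, the following NumPy sketch (not from the original article; the 2×2 covariance matrix and the use of symmetric matrix square roots are arbitrary choices made here) contrasts standardization, decorrelation, whitening and coloring on a known covariance matrix.

```python
import numpy as np

# Example covariance matrix (chosen arbitrarily for illustration).
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

v = np.diag(Sigma)                        # variances
V_inv_sqrt = np.diag(v ** -0.5)           # V^{-1/2}
P = V_inv_sqrt @ Sigma @ V_inv_sqrt       # correlation matrix

def sym_inv_sqrt(M):
    """Symmetric inverse square root of a positive-definite matrix."""
    evals, U = np.linalg.eigh(M)
    return U @ np.diag(evals ** -0.5) @ U.T

# Standardization V^{-1/2}: unit variances, correlations unchanged.
print(V_inv_sqrt @ Sigma @ V_inv_sqrt.T)          # equals P

# Decorrelation V^{1/2} P^{-1/2} V^{-1/2}: zero correlations, variances kept.
D = np.diag(v ** 0.5) @ sym_inv_sqrt(P) @ V_inv_sqrt
print(D @ Sigma @ D.T)                            # diagonal with entries v

# Whitening Sigma^{-1/2} (ZCA): identity covariance.
W = sym_inv_sqrt(Sigma)
print(W @ Sigma @ W.T)                            # identity matrix

# Coloring: the inverse of a whitening matrix maps white noise back to Sigma.
C = np.linalg.inv(W)
print(C @ np.eye(2) @ C.T)                        # equals Sigma
```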


Definition

Suppose X is a random (column) vector with non-singular covariance matrix \Sigma and mean 0. Then the transformation Y = W X with a whitening matrix W satisfying the condition W^\mathrm{T} W = \Sigma^{-1} yields the whitened random vector Y with unit diagonal covariance.

There are infinitely many possible whitening matrices W that all satisfy the above condition. Commonly used choices are W = \Sigma^{-1/2} (Mahalanobis or ZCA whitening), W = L^\mathrm{T} where L is the Cholesky decomposition of \Sigma^{-1} (Cholesky whitening), or the eigen-system of \Sigma (PCA whitening).

Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of X and Y. For example, the unique optimal whitening transformation achieving maximal component-wise correlation between the original X and the whitened Y is produced by the whitening matrix W = P^{-1/2} V^{-1/2}, where P is the correlation matrix and V the variance matrix.


Whitening a data matrix

Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
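A minimal sketch of this empirical procedure (assumptions made here: rows of the data matrix are observations, the covariance is estimated by maximum likelihood, i.e. normalised by n, and Cholesky whitening is used for the estimated whitening matrix):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data matrix X: n observations (rows) of p correlated variables.
n, p = 1000, 4
true_cov = np.array([[4.0, 2.0, 0.0, 0.0],
                     [2.0, 3.0, 1.0, 0.0],
                     [0.0, 1.0, 2.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(p), cov=true_cov, size=n)

Xc = X - X.mean(axis=0)              # centre the data
Sigma_hat = (Xc.T @ Xc) / n          # maximum likelihood covariance estimate

# Estimated Cholesky whitening matrix: W = L^T with L L^T = Sigma_hat^{-1}.
L = np.linalg.cholesky(np.linalg.inv(Sigma_hat))
W_hat = L.T

Z = Xc @ W_hat.T                     # whitened data, one row per observation

# The sample covariance of Z is the identity matrix (up to rounding).
print(np.round((Z.T @ Z) / n, 6))
```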


R implementation

An implementation of several whitening procedures in R, including ZCA whitening, PCA whitening, and CCA whitening, is available in the "whitening" R package published on CRAN.


See also

* Decorrelation
* Principal component analysis
* Weighted least squares
* Canonical correlation analysis
* Mahalanobis distance (equal to the Euclidean distance after whitening)




External links

* http://courses.media.mit.edu/2010fall/mas622j/whiten.pdf
* The ZCA whitening transformation, Appendix A of ''Learning Multiple Layers of Features from Tiny Images'' by A. Krizhevsky.